EFIM: A Highly Efficient Algorithm for High-Utility Itemset Mining

Authors

  • Souleymane Zida
  • Philippe Fournier-Viger
  • Chun-Wei Lin
  • Cheng-Wei Wu
  • Vincent S. Tseng

Abstract

High-utility itemset mining (HUIM) is an important data mining task with wide applications. In this paper, we propose a novel algorithm named EFIM (EFficient high-utility Itemset Mining), which introduces several new ideas to discover high-utility itemsets more efficiently in terms of both execution time and memory. EFIM relies on two upper bounds, named sub-tree utility and local utility, to prune the search space more effectively. It also introduces a novel array-based utility counting technique named Fast Utility Counting to calculate these upper bounds in linear time and space. Moreover, to reduce the cost of database scans, EFIM proposes efficient database projection and transaction merging techniques. An extensive experimental study on various datasets shows that EFIM is in general two to three orders of magnitude faster and consumes up to eight times less memory than the state-of-the-art algorithms d2HUP, HUI-Miner, HUP-Miner, FHM and UP-Growth+.
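
The idea of array-based utility counting can be illustrated with a minimal sketch. The code below is not the authors' implementation: it assumes items are numbered 0..n-1 and that each transaction stores the utility (e.g. profit) of every item it contains. It fills one array of "utility bins" in a single database scan to obtain, for each item, the classic transaction-weighted utilization (TWU) upper bound; the tighter sub-tree utility and local utility bounds from the paper are not reproduced here. The function names, the toy database, and the threshold are all hypothetical.

```python
from typing import List, Sequence, Tuple

# A transaction is a list of (item, utility) pairs. Toy data, invented
# purely for illustration; it does not come from the paper.
Transaction = List[Tuple[int, int]]


def utility_bin_twu(database: Sequence[Transaction], num_items: int) -> List[int]:
    """Fill one utility bin per item in a single scan of the database.

    Each bin accumulates the total utility of every transaction that
    contains the item, which is an upper bound (TWU) on the utility of
    any itemset containing that item.
    """
    bins = [0] * num_items
    for transaction in database:
        transaction_utility = sum(utility for _, utility in transaction)
        for item, _ in transaction:
            bins[item] += transaction_utility
    return bins


def itemset_utility(database: Sequence[Transaction], itemset: Sequence[int]) -> int:
    """Exact utility of an itemset: sum over the transactions containing it."""
    wanted = set(itemset)
    total = 0
    for transaction in database:
        present = dict(transaction)
        if wanted.issubset(present):
            total += sum(present[item] for item in wanted)
    return total


DATABASE: List[Transaction] = [
    [(0, 5), (2, 1), (3, 2)],
    [(0, 10), (2, 6), (4, 6)],
    [(1, 4), (2, 3), (3, 6)],
]
MIN_UTILITY = 25  # hypothetical threshold

bins = utility_bin_twu(DATABASE, num_items=5)
# Items whose upper bound falls below the threshold cannot appear in any
# high-utility itemset and can be discarded before the search starts.
promising_items = [item for item, bound in enumerate(bins) if bound >= MIN_UTILITY]
```

Because the array is indexed directly by item number, the scan costs time proportional to the total number of item occurrences and space proportional to the number of items, which is the kind of linear behaviour the abstract refers to.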

Similar resources

A New Algorithm for High Average-utility Itemset Mining

High utility itemset mining (HUIM) is an emerging field in data mining that has gained growing interest due to its various applications. The goal of this problem is to discover all itemsets whose utility exceeds a minimum threshold. The basic HUIM problem does not consider the length of itemsets in its utility measurement, and utility values tend to become higher for itemsets containing more items...

An empirical evaluation of high utility itemset mining algorithms

High utility itemset mining (HUIM) has emerged as an important research topic in data mining, with applications to retail-market data analysis, stock market prediction, and recommender systems, etc. However, there are very few empirical studies that systematically compare the performance of state-of-the-art HUIM algorithms. In this paper, we present an experimental evaluation on 10 major HUIM a...

Mining Correlated High-Utility Itemsets Using the Bond Measure

High-utility itemset mining is the task of finding the sets of items that yield a high utility (e.g. profit) in quantitative transaction databases. An important limitation of previous work on high-utility itemset mining is that utility is generally used as the sole criterion for assessing the interestingness of patterns. This leads to finding many itemsets that have a high profit but contain it...

A Fast Algorithm for Mining Utility-Frequent Itemsets

Utility-based data mining is a new research area interested in all types of utility factors in data mining processes and targeted at incorporating utility considerations in both predictive and descriptive data mining tasks. High utility itemset mining is a research area of utility-based descriptive data mining, aimed at finding itemsets that contribute most to the total utility. A specialized fo...

An Efficient Data Structure for Fast Mining High Utility Itemsets

High utility itemset mining has emerged as an important research issue in data mining since it has a wide range of real-life applications. Although a number of algorithms have been proposed in recent years, there still seems to be a lack of efficient algorithms, since these algorithms suffer from either the problem of low efficiency of calculating candidates’ utilities or the proble...

Publication date: 2015